IEICE global.ieice.org Site

Keyword Search Result

Keyword: neural net (879 hits)

Results 221-240 of 879

A Spectral Clustering Based Filter-Level Pruning Method for Convolutional Neural Networks
Lianqiang LI, Jie ZHU, Ming-Ting SUN

LETTER-Artificial Intelligence, Data Mining

Publicized: 2019/09/17
Vol: E102-D, No. 12
Page(s): 2624-2627
Convolutional Neural Networks (CNNs) usually have millions or even billions of parameters, which makes them hard to deploy on mobile devices. In this work, we present a novel filter-level pruning method to alleviate this issue. More concretely, we first construct an undirected fully connected graph to represent a pre-trained CNN model. Then, we employ the spectral clustering algorithm to divide the graph into several subgraphs, which is equivalent to clustering similar filters of the CNN into the same groups. After obtaining the grouping relationships among the filters, we keep one filter per group and retrain the pruned model. Compared with previous pruning methods that identify redundant filters heuristically, the proposed method selects pruning candidates more reasonably and precisely. Experimental results also show that the proposed pruning method significantly improves on state-of-the-art methods.
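The abstract does not state the edge weights of the filter graph or how the surviving filter of each group is chosen, so the following is only a minimal sketch under assumptions: Gaussian (RBF) similarity between flattened filter weights, normalized-Laplacian spectral clustering (Ng-Jordan-Weiss style), and a hypothetical rule that keeps the member closest to its cluster mean. All function names are illustrative.

```python
import numpy as np


def spectral_embedding(filters, k, sigma=1.0):
    """Embed each filter as a row of the k smallest eigenvectors of the normalized graph Laplacian."""
    x = filters.reshape(filters.shape[0], -1)               # one flattened weight vector per filter
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    w = np.exp(-sq_dist / (2.0 * sigma ** 2))                # fully connected graph, RBF weights (assumed)
    np.fill_diagonal(w, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    lap = np.eye(len(x)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                            # eigenvalues in ascending order
    u = vecs[:, :k]
    return u / np.linalg.norm(u, axis=1, keepdims=True)      # row-normalize before k-means


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new_centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def filters_to_keep(filters, k, sigma=1.0):
    """Cluster filters into k groups and return the sorted indices of one representative per group."""
    labels = kmeans(spectral_embedding(filters, k, sigma), k)
    flat = filters.reshape(filters.shape[0], -1)
    keep = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        centroid = flat[members].mean(axis=0)
        # Hypothetical selection rule: keep the member nearest the group mean.
        keep.append(int(members[np.argmin(((flat[members] - centroid) ** 2).sum(axis=1))]))
    return sorted(keep)
```

The pruned layer would then retain only the returned output channels before retraining; `sigma` controls how quickly similarity decays and would need tuning per layer.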
SDChannelNets: Extremely Small and Efficient Convolutional Neural Networks
JianNan ZHANG, JiJun ZHOU, JianFeng WU, ShengYing YANG

LETTER-Biocybernetics, Neurocomputing

Publicized: 2019/09/10
Vol: E102-D, No. 12
Page(s): 2646-2650
Convolutional neural networks (CNNs) have a strong ability to understand and judge images. However, the large number of parameters and the heavy computation of CNNs limit their application on resource-limited devices. In this letter, we use the ideas of parameter sharing and dense connection to compress the parameters along the channel direction of the convolution kernel, greatly reducing the number of model parameters. On this basis, we design Shared and Dense Channel-wise Convolutional Networks (SDChannelNets), composed mainly of Depth-wise Separable SD-Channel-wise Convolution layers. The advantage of SDChannelNets is that the number of model parameters is greatly reduced with little or no loss of accuracy. We also introduce a hyperparameter that effectively balances the number of parameters against model accuracy. We evaluate the proposed model on two popular image recognition tasks (CIFAR-10 and CIFAR-100). The results show that SDChannelNets achieve accuracy similar to that of other CNNs with a greatly reduced number of parameters.
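To illustrate why compressing along the channel direction saves parameters, the sketch below compares weight counts for a standard convolution, a depthwise separable one, and a depthwise convolution followed by a single 1-D kernel slid along the channel axis (the general channel-wise convolution idea). It is an assumption-laden illustration, not the letter's SD-Channel-wise layer: dense connections and the balancing hyperparameter are not modeled, and every name here is hypothetical.

```python
import numpy as np


def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise convolution followed by a dense 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def channelwise_params(k, c_in, d_c):
    """k x k depthwise convolution followed by one length-d_c kernel shared across channels."""
    return k * k * c_in + d_c


def channelwise_conv(x, kernel):
    """Slide one 1-D kernel along the channel axis of x (shape C x H x W) with zero padding.

    The same kernel is reused at every channel offset and every pixel, which is the source of
    the parameter sharing. Assumes an odd kernel length so the output keeps C channels.
    """
    d = len(kernel)
    pad = d // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        out[i] = np.tensordot(kernel, xp[i:i + d], axes=(0, 0))
    return out
```

For a 3 x 3 layer with 64 input and 64 output channels, these functions give 36,864, 4,672, and (with an assumed d_c = 7) 583 weights respectively.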
Acoustic Design Support System of Compact Enclosure for Smartphone Using Deep Neural Network
Kai NAKAMURA, Kenta IWAI, Yoshinobu KAJIKAWA

PAPER-Engineering Acoustics

Vol: E102-A, No. 12
Page(s): 1932-1939
In this paper, we propose an automatic design support system for compact acoustic devices such as the microspeakers inside smartphones. Given a desired acoustic characteristic, the proposed system outputs the dimensions of a compact acoustic device that realizes it. The system uses a deep neural network (DNN) to learn the relationship between the frequency characteristic of a compact acoustic device and its dimensions. The training data are generated by the acoustic finite-difference time-domain (FDTD) method, so that a large amount of training data can be obtained easily. We demonstrate the effectiveness of the proposed system by comparing desired and designed frequency characteristics.
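To show how FDTD turns a dimension into a frequency characteristic, here is a deliberately tiny stand-in: a 1-D staggered-grid acoustic FDTD simulation of a rigid-walled tube, whose first resonance should appear near c/(2L). This is not the paper's enclosure model; the grid spacing, air constants, impulse excitation, and function name are all assumptions.

```python
import numpy as np


def tube_response(length, dx=0.005, c=343.0, rho=1.21, courant=0.5, n_steps=14000):
    """Simulate a rigid-walled 1-D tube; return (freqs, |spectrum|) of the pressure at the far end."""
    n = int(round(length / dx))
    dt = courant * dx / c                   # satisfies the 1-D CFL condition c*dt/dx <= 1
    p = np.zeros(n)                         # pressure at cell centres
    u = np.zeros(n + 1)                     # particle velocity at cell faces; u[0] = u[n] = 0 are rigid walls
    p[0] = 1.0                              # impulse excitation next to one wall
    trace = np.empty(n_steps)
    for t in range(n_steps):
        u[1:-1] -= dt / (rho * dx) * (p[1:] - p[:-1])      # momentum equation update
        p -= dt * rho * c ** 2 / dx * (u[1:] - u[:-1])     # continuity equation update
        trace[t] = p[-1]                                   # receiver at the opposite wall
    spectrum = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(n_steps, dt)
    return freqs, spectrum
```

Sweeping `length` and recording each spectrum would yield (dimension, response) pairs of the kind an inverse DNN could be trained on; for a 0.5 m tube the dominant peak between 100 and 500 Hz lies near 343 Hz.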
High Noise Tolerant R-Peak Detection Method Based on Deep Convolution Neural Network
Menghan JIA, Feiteng LI, Zhijian CHEN, Xiaoyan XIANG, Xiaolang YAN

LETTER-Biological Engineering

Publicized: 2019/08/02
Vol: E102-D, No. 11
Page(s): 2272-2275
This paper presents an R-peak detection method with high noise tolerance. The method uses a customized deep convolutional neural network (DCNN) to extract morphological and temporal features from sliced electrocardiogram (ECG) signals. The proposed network adopts multiple parallel dilated convolution layers to analyze features over diverse fields of view. A sliding window slices the original ECG signal into segments; the network processes one segment at a time and outputs, for every point, the probability that it belongs to an R-peak region. After binarization and deburring, the occurrence times of the R-peaks can be located. Experimental results on the MIT-BIH database show that R-peak detection accuracy is significantly improved under high-intensity electrode motion artifact or muscle artifact noise, outperforming state-of-the-art methods.
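The post-processing step (binarize the per-point probabilities, remove short spurious runs, then locate each peak) can be sketched as follows. The threshold, the minimum run length used for "deburring", and placing the R-peak at the most probable sample of each region are assumptions for illustration, not values from the paper.

```python
import numpy as np


def locate_r_peaks(prob, threshold=0.5, min_len=5):
    """Return sample indices of R-peaks from per-sample R-region probabilities."""
    mask = prob >= threshold                                    # binarization
    edges = np.diff(np.concatenate(([0], mask.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)                         # first sample of each run
    ends = np.flatnonzero(edges == -1)                          # one past the last sample
    peaks = []
    for s, e in zip(starts, ends):
        if e - s < min_len:                                     # deburring: drop short runs
            continue
        peaks.append(int(s + np.argmax(prob[s:e])))             # most probable sample in the region
    return peaks
```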
Discriminative Convolutional Neural Network for Image Quality Assessment with Fixed Convolution Filters
Motohiro TAKAGI, Akito SAKURAI, Masafumi HAGIWARA

LETTER-Image Recognition, Computer Vision

Publicized: 2019/08/09
Vol: E102-D, No. 11
Page(s): 2265-2266
Conventional (full-reference) image quality assessment (IQA) methods require the original image for evaluation. Recently, however, IQA methods based on machine learning have been proposed; these methods learn the relationship between a distorted image and its quality automatically. In this paper, we propose a deep-learning-based IQA method that does not require a reference image. We show that a convolutional neural network with distortion prediction and fixed filters improves IQA accuracy.
Multi Model-Based Distillation for Sound Event Detection (Open Access)
Yingwei FU, Kele XU, Haibo MI, Qiuqiang KONG, Dezhi WANG, Huaimin WANG, Tie HONG

LETTER-Artificial Intelligence, Data Mining

Publicized: 2019/07/08
Vol: E102-D, No. 10
Page(s): 2055-2058
Sound event detection aims to identify the sound events in audio recordings and has wide real-world applications. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance on this task owing to their ability to learn representative features. However, CRNN models are highly complex, with millions of parameters to train, which limits their use on mobile and embedded devices with limited computational resources. Model distillation effectively transfers the knowledge of a complex model to a smaller one that can be deployed on devices with limited computational power. In this letter, we propose a novel multi model-based distillation approach for sound event detection that uses knowledge from multiple teacher models that are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50 times while also obtaining better performance on the sound event detection task.
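A common way to combine several teachers is to blend a hard-label loss with a loss against the teachers' averaged soft predictions. The sketch below does this for multi-label, frame-level sigmoid outputs; the simple averaging of teachers and the weight `alpha` are assumptions, and the letter's actual combination scheme may differ.

```python
import numpy as np


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard labels or soft probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)))


def multi_teacher_distillation_loss(student_probs, teacher_probs, labels, alpha=0.5):
    """Blend the hard-label loss with a loss against the averaged teacher predictions.

    student_probs, labels: (frames, classes); teacher_probs: (teachers, frames, classes).
    """
    soft_targets = np.mean(teacher_probs, axis=0)    # simple average over teachers (assumption)
    hard = binary_cross_entropy(labels, student_probs)
    soft = binary_cross_entropy(soft_targets, student_probs)
    return alpha * hard + (1.0 - alpha) * soft
```

Because the loss is linear in the targets, it is minimized when the student outputs `alpha * labels + (1 - alpha) * soft_targets`.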
A Deep Learning Approach to Writer Identification Using Inertial Sensor Data of Air-Handwriting
Yanfang DING, Yang XUE

LETTER-Pattern Recognition

Publicized: 2019/07/18
Vol: E102-D, No. 10
Page(s): 2059-2063
To the best of our knowledge, there are few studies on character-level writer identification for air-handwriting that use only acceleration and angular velocity data. In this paper, we propose a deep learning approach to writer identification that uses only inertial sensor data of air-handwriting. In particular, we process different representations of the degrees of freedom (DoF) of air-handwriting in separate CNNs to extract local dependencies and interrelationships. Experiments on a public dataset show good average performance without any additional hand-designed feature extraction.
Hardware-Based Principal Component Analysis for Hybrid Neural Network Trained by Particle Swarm Optimization on a Chip
Tuan Linh DANG, Yukinobu HOSHINO

PAPER-Neural Networks and Bioengineering

Vol: E102-A, No. 10
Page(s): 1374-1382
This paper presents a hybrid architecture for a neural network (NN) trained by a particle swarm optimization (PSO) algorithm. The NN is implemented on the hardware side, while the PSO is executed by a processor on the software side. In addition, principal component analysis (PCA) is applied to reduce correlated information. The PCA module is implemented in hardware using SystemVerilog to increase operating speed. Experimental results show that the proposed architecture was implemented successfully. In addition, the hardware-based NN trained by PSO (NN-PSO) ran faster than a software-based NN-PSO. The proposed NN-PSO with PCA also achieved better recognition rates than NN-PSO without PCA.
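As a software reference for the two algorithms only (not the paper's hardware/software partition), the sketch below projects inputs onto principal components via SVD and minimizes an arbitrary objective with global-best PSO. The inertia and acceleration constants are common textbook defaults, assumed rather than taken from the paper.

```python
import numpy as np


def pca(x, k):
    """Project the rows of x onto the top-k principal components."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # rows of vt are principal directions
    return centered @ vt[:k].T


def pso(objective, dim, n_particles=30, iters=200, w=0.7298, c1=1.49618, c2=1.49618,
        bound=5.0, seed=0):
    """Global-best particle swarm optimization; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-bound, bound, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity = inertia + pull toward each particle's own best + pull toward the swarm's best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())
```

To train an NN this way, `objective` would unflatten a weight vector into the network and return its training error on PCA-reduced inputs; no gradients are needed, which is what makes PSO convenient when the network itself runs in hardware.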
Low-Cost Method for Recognizing Table Tennis Activity
Se-Min LIM, Jooyoung PARK, Hyeong-Cheol OH

LETTER-Artificial Intelligence, Data Mining

Publicized: 2019/06/18
Vol: E102-D, No. 10
Page(s): 2051-2054
This study designs a low-cost portable device that functions as a coaching assistant system to support table tennis practice. Although deep learning is a promising approach to human activity recognition, we propose using cosine similarity for inference. Our experiments show that cosine-similarity-based inference can be a good alternative to deep-learning-based inference for the assistant system when resources are limited.
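Cosine-similarity inference can be as simple as comparing a sensor window against one stored reference template per activity and picking the closest. The template-per-class scheme and the stroke labels below are hypothetical; the study's actual features and templates are not described in the abstract.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def classify_stroke(sample, templates):
    """Return (label, score) of the template most cosine-similar to the sample window."""
    best_label, best_score = None, -np.inf
    for label, template in templates.items():
        score = cosine_similarity(sample.ravel(), template.ravel())
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Inference costs one dot product per template, which is why this kind of approach suits a low-cost device better than running a deep network.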
Cross-Domain Deep Feature Combination for Bird Species Classification with Audio-Visual Data
Naranchimeg BOLD, Chao ZHANG, Takuya AKASHI

PAPER-Multimedia Pattern Processing

Publicized: 2019/06/27
Vol: E102-D, No. 10
Page(s): 2033-2042
Over the past decade, many state-of-the-art algorithms for image classification and audio classification have achieved notable success with the development of deep convolutional neural networks (CNNs). However, most works exploit only a single type of training data. In this paper, we present a study on classifying bird species by exploiting the combination of visual (image) and audio (sound) data using CNNs, a problem that has received little attention so far. Specifically, we propose CNN-based multimodal learning models with three fusion strategies (early, middle, and late) to address the problem of combining training data across domains. The advantage of the proposed method lies in the fact that a CNN is used not only to extract features from image and audio data (spectrograms) but also to combine features across modalities. In the experiments, we train and evaluate the network on the standard CUB-200-2011 dataset combined with an audio dataset that we collected for the same species. We observe that a model using both types of data outperforms models trained with only one type. We also show that transfer learning can significantly increase classification performance.
LGCN: Learnable Gabor Convolution Network for Human Gender Recognition in the Wild (Open Access)
Peng CHEN, Weijun LI, Linjun SUN, Xin NING, Lina YU, Liping ZHANG

LETTER-Image Recognition, Computer Vision

Publicized: 2019/06/13
Vol: E102-D, No. 10
Page(s): 2067-2071
Human gender recognition in the wild is a challenging task due to complex face variations in pose, lighting, occlusion, and so on. In this letter, we propose the learnable Gabor convolutional network (LGCN), a new neural network framework for gender recognition. In LGCN, a learnable Gabor filter (LGF) is introduced and combined with a convolutional neural network (CNN). Specifically, the framework is constructed by replacing some of the first-layer convolution kernels of a standard CNN with LGFs. The LGFs learn their intrinsic parameters through standard backpropagation, so these parameters are no longer fixed empirically as in traditional methods but are learned automatically. In addition, the gender recognition performance of LGCN is further improved by a proposed feature combination strategy. Experimental results demonstrate that, compared with standard CNNs of identical architecture, our approach achieves better performance on three challenging public datasets without increasing the number of parameters.
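A Gabor kernel is fully determined by a handful of scalars, which is what makes it "learnable": in an autodiff framework the same expression is differentiable with respect to those scalars. The NumPy sketch below only builds the standard real-valued Gabor kernel; which parameters LGCN actually trains, and its kernel size, are assumptions here.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, lambd, gamma, psi):
    """Real Gabor kernel: Gaussian envelope times a cosine carrier, rotated by theta.

    sigma: envelope width, theta: orientation, lambd: wavelength,
    gamma: spatial aspect ratio, psi: phase offset. size should be odd.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / lambd + psi)
    return envelope * carrier
```

In a learnable layer, `sigma`, `theta`, `lambd`, `gamma`, and `psi` would be trainable tensors and the kernel would be regenerated on every forward pass, so backpropagation updates five scalars per filter instead of every kernel weight.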
Vector Quantization of High-Dimensional Speech Spectra Using Deep Neural Network
JianFeng WU, HuiBin QIN, YongZhu HUA, LiHuan SHAO, Ji HU, ShengYing YANG

LETTER-Artificial Intelligence, Data Mining

Publicized: 2019/07/02
Vol: E102-D, No. 10
Page(s): 2047-2050
This paper proposes a deep neural network (DNN) based framework for vector quantization (VQ) of high-dimensional data. The main challenge in applying a DNN to VQ is reducing the binary coding error of the auto-encoder when the distribution of the coding units is far from binary. To address this problem, three fine-tuning methods are adopted: 1) adding Gaussian noise to the input of the coding layer, 2) forcing the output of the coding layer to be binary, and 3) adding a non-binary penalty term to the loss function. These fine-tuning methods are evaluated extensively on quantizing speech magnitude spectra. The results demonstrate that each method improves coding performance. When quantizing 968-dimensional speech spectra using only 18 bits, the DNN-based VQ framework achieves an average PESQ of about 2.09, which is far beyond the capability of conventional VQ methods.
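The three fine-tuning methods act on the auto-encoder's coding layer. The sketch below shows one plausible form of each: optional Gaussian noise before a sigmoid, optional hard binarization, and an assumed penalty h(1 - h) that vanishes for binary units. The penalty's exact form in the paper is not given in the abstract, and all names are illustrative.

```python
import numpy as np


def coding_layer(pre_activation, noise_std=0.0, hard=False, rng=None):
    """Sigmoid coding layer with methods 1 and 2 as options."""
    if noise_std > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # Method 1: noise on the coding-layer input during training encourages saturated units.
        pre_activation = pre_activation + rng.normal(0.0, noise_std, size=np.shape(pre_activation))
    h = 1.0 / (1.0 + np.exp(-pre_activation))
    if hard:
        # Method 2: force the code to be binary.
        h = (h >= 0.5).astype(float)
    return h


def non_binary_penalty(h):
    """Method 3 (assumed form): zero for binary units, largest (0.25) at h = 0.5."""
    return float(np.mean(h * (1.0 - h)))


def code_to_index(bits):
    """Pack a binary code into an integer codeword index (18 bits give 2**18 codewords)."""
    return int("".join(str(int(b)) for b in bits), 2)
```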
Character-Level Convolutional Neural Network for Predicting Severity of Software Vulnerability from Vulnerability Description
Shunta NAKAGAWA, Tatsuya NAGAI, Hideaki KANEHARA, Keisuke FURUMOTO, Makoto TAKITA, Yoshiaki SHIRAISHI, Takeshi TAKAHASHI, Masami MOHRI, Yasuhiro TAKANO, Masakatu MORII

LETTER-Cybersecurity

Publicized: 2019/06/21
Vol: E102-D, No. 9
Page(s): 1679-1682
System administrators and security officials of an organization need to deal with vulnerable IT assets, especially those with severe vulnerabilities, to minimize the risk of these vulnerabilities being exploited. The Common Vulnerability Scoring System (CVSS) can be used to calculate the severity score of a vulnerability, but it currently requires human operators to choose the input values. A word-level Convolutional Neural Network (CNN) has been proposed to estimate the input parameters of CVSS and derive the severity score from vulnerability notes, but its accuracy needs further improvement. In this paper, we propose a character-level CNN for estimating severity scores. Experiments show that the proposed scheme outperforms the conventional word-level scheme in terms of both accuracy and error characteristics.
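A character-level CNN typically consumes text as a fixed-size one-hot matrix (alphabet size x maximum length). The sketch below shows that quantization step; the alphabet, maximum length, and lowercasing are assumptions, since the abstract does not specify the letter's input encoding.

```python
import string

import numpy as np

# Hypothetical alphabet; the letter's actual character set is not given in the abstract.
ALPHABET = string.ascii_lowercase + string.digits + " .,;:!?'\"/\\|_@#$%^&*~`+-=<>()[]{}"
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize_text(text, max_len=1024):
    """One-hot encode a vulnerability description as an (alphabet size x max_len) matrix.

    Characters outside the alphabet become all-zero columns; longer texts are truncated
    and shorter ones zero-padded, so every input has the same shape for the CNN.
    """
    mat = np.zeros((len(ALPHABET), max_len), dtype=np.float32)
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            mat[idx, pos] = 1.0
    return mat
```

Unlike a word-level model, this encoding has no out-of-vocabulary words, which can matter for descriptions full of product names, version strings, and identifiers.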
A New Method for Futures Price Trends Forecasting Based on BPNN and Structuring Data
Weijun LU, Chao GENG, Dunshan YU

LETTER-Artificial Intelligence, Data Mining

Publicized: 2019/05/28
Vol: E102-D, No. 9
Page(s): 1882-1886
Forecasting commodity futures prices is a challenging task. We present an algorithm that predicts the trend of commodity futures prices based on structuring data and a back-propagation neural network (BPNN). The random volatility of futures prices can be filtered out in the structuring data. Moreover, the method is not restricted to a particular type of futures contract. Experiments show that the algorithm achieves 80% accuracy in predicting price trends.
Multi-Level Attention Based BLSTM Neural Network for Biomedical Event Extraction
Xinyu HE, Lishuang LI, Xingchen SONG, Degen HUANG, Fuji REN

PAPER-Natural Language Processing

Publicized: 2019/04/26
Vol: E102-D, No. 9
Page(s): 1842-1850
Biomedical event extraction is an important and challenging task in information extraction that plays a key role in medical research and disease prevention. Most existing event detection methods are based on shallow machine learning, which relies mainly on domain knowledge and elaborately designed features. Another challenge is that crucial information, as well as interactions among words or arguments, may be ignored because most works treat all words and sentences equally. We therefore employ a Bidirectional Long Short-Term Memory (BLSTM) neural network for event extraction, which avoids complex handcrafted feature extraction. Furthermore, we propose a multi-level attention mechanism comprising word-level attention, which determines the importance of words in a sentence, and sentence-level attention, which determines the importance of relevant arguments. Finally, we train dependency-based word embeddings and add sentence vectors to enrich semantic information. Experimental results show that our model achieves an F-score of 59.61% on MLEE, the commonly used biomedical event extraction dataset, outperforming other state-of-the-art methods.
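Word-level attention can be sketched as additive attention pooling over the BLSTM outputs: score each time step, normalize the scores with a softmax, and take the weighted sum. The additive scoring function and parameter shapes are assumptions; the paper's exact formulation may differ, and the same pooling pattern would apply at the sentence level over argument representations.

```python
import numpy as np


def word_attention(hidden, w, b, v):
    """Additive attention pooling.

    hidden: (T, 2H) BLSTM outputs; w: (2H, A); b: (A,); v: (A,).
    Returns (context vector of shape (2H,), attention weights of shape (T,)).
    """
    scores = np.tanh(hidden @ w + b) @ v       # one relevance score per word
    scores = scores - scores.max()             # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden                 # importance-weighted sum of word states
    return context, weights
```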
MF-CNN: Traffic Flow Prediction Using Convolutional Neural Network and Multi-Features Fusion
Di YANG, Songjiang LI, Zhou PENG, Peng WANG, Junhui WANG, Huamin YANG

PAPER-Artificial Intelligence, Data Mining

Publicized: 2019/05/20
Vol: E102-D, No. 8
Page(s): 1526-1536
Accurate traffic flow prediction is a precondition for many applications in Intelligent Transportation Systems, such as traffic control and route guidance. Traditional data-driven traffic flow prediction models tend to ignore traffic self-features (e.g., periodicity) and commonly suffer from shifts caused by various complex factors (e.g., weather and holidays), which reduce the precision and robustness of the prediction models. To tackle this problem, we propose a CNN-based multi-feature predictive model (MF-CNN) that collectively predicts network-scale traffic flow from multiple spatiotemporal features and external factors (weather and holidays). Specifically, we classify traffic self-features into temporal continuity as a short-term feature and daily and weekly periodicity as long-term features, and then map each of them into a two-dimensional time-by-space representation expressed as a matrix. The high-level spatiotemporal features learned by CNNs from matrices with different time lags are then fused with the external factors by a logistic regression layer to derive the final prediction. Experimental results indicate that MF-CNN, by considering multiple features, improves predictive performance over five baseline models and achieves a good trade-off between accuracy and efficiency.
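The three time-by-space input matrices can be sketched as slices of a (time slots x road segments) flow array: the most recent slots, the same slot on previous days, and the same slot in previous weeks. The number of lags per feature and the 5-minute slot size (288 slots per day) are assumptions for illustration.

```python
import numpy as np


def build_inputs(flow, t, n_close=3, n_daily=3, n_weekly=3, slots_per_day=288):
    """Build the short-term, daily, and weekly (lags x segments) matrices for target slot t.

    flow: (T, N) traffic flow for T time slots and N road segments.
    """
    if t < n_weekly * 7 * slots_per_day:
        raise ValueError("not enough history for the requested weekly lags")
    close = flow[t - n_close:t]                                                   # temporal continuity
    daily = flow[[t - d * slots_per_day for d in range(n_daily, 0, -1)]]          # daily periodicity
    weekly = flow[[t - k * 7 * slots_per_day for k in range(n_weekly, 0, -1)]]    # weekly periodicity
    return close, daily, weekly
```

Each matrix would feed its own CNN branch before fusion with the external factors.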
TDCTFIC: A Novel Recommendation Framework Fusing Temporal Dynamics, CNN-Based Text Features and Item Correlation
Meng Ting XIONG, Yong FENG, Ting WU, Jia Xing SHANG, Bao Hua QIANG, Ya Nan WANG

PAPER-Data Engineering, Web Information Systems

Publicized: 2019/05/14
Vol: E102-D, No. 8
Page(s): 1517-1525
Traditional recommendation systems (RSs) learn users' latent preferences and items' latent attributes from the rating records between users and items in order to make recommendations. However, for new items with no historical ratings, traditional RSs suffer from the typical cold start problem. Auxiliary information has often been used in item cold start recommendation; we further bring temporal dynamics, text, and item correlation into our models to alleviate item cold start. Two new cold start recommendation models, TmTx (Time, Text) and TmTI (Time, Text, Item correlation), are proposed to solve the item cold start problem in different cold start scenarios. Well-known methods such as TimeSVD++ and CoFactor partially take temporal dynamics, comments, and item correlations into account to address the cold start problem, but none of them combines all of this information. We use a convolutional neural network (CNN) to extract features from item description text, which enables the models to handle cold start items. By fusing time, text, and correlation features, both proposed models effectively improve performance under item cold start. Experimental results on three real-world datasets show that the proposed models achieve significant improvements over the baseline methods.
Speech Quality Enhancement for In-Ear Microphone Based on Neural Network
Hochong PARK, Yong-Shik SHIN, Seong-Hyeon SHIN

LETTER-Speech and Hearing

Publicized: 2019/05/15
Vol: E102-D, No. 8
Page(s): 1594-1597
Speech captured by an in-ear microphone placed inside an occluded ear has a high signal-to-noise ratio, but its sound characteristics differ from those of normal speech captured through air conduction. In this study, we propose a blind speech quality enhancement method that converts speech captured by an in-ear microphone into speech resembling normal speech. The proposed method estimates an input-dependent enhancement function using a neural network in the feature domain and enhances the captured speech via time-domain filtering. Subjective and objective evaluations confirm that speech enhanced by the proposed method sounds more similar to normal speech than speech enhanced by conventional equalizer-based methods.
Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech
Kentaro SONE, Toru NAKASHIKA

PAPER-Speech and Hearing

Publicized: 2019/05/15
Vol: E102-D, No. 8
Page(s): 1546-1553
Conventional approaches to statistical parametric speech synthesis use context-dependent hidden Markov models (HMMs), clustered using decision trees, to generate speech parameters from linguistic features. However, decision trees are not always appropriate for efficiently modeling complex context dependencies of linguistic features. An alternative scheme that replaces decision trees with deep neural networks (DNNs) has been presented as a way to overcome this difficulty. By training the network to represent high-dimensional feedforward dependencies from linguistic features to acoustic features, DNN-based speech synthesis systems convert text into speech. To improve the naturalness of the synthesized speech, this paper presents a novel pre-training method for DNN-based statistical parametric speech synthesis systems. In our method, a deep relational model (DRM), which represents the joint probability of two visible variables, is applied to describe the joint distribution of acoustic and linguistic features. Like a DNN, a DRM consists of several hidden layers and two visible layers. Whereas a DNN represents feedforward dependencies from one set of visible variables (inputs) to another (outputs), a DRM can represent bidirectional dependencies between two visible variables. During maximum-likelihood (ML) based training, the model optimizes the parameters of its deep architecture (connection weights between adjacent layers, and biases) considering bidirectional conversion between 1) acoustic features given linguistic features and 2) linguistic features given the acoustic features generated by the model itself. Because training considers whether the generated acoustic features are recognizable, our method can obtain reasonable parameters for speech synthesis.
Experimental results on a speech synthesis task show that pre-trained DNN-based systems using our proposed method outperformed randomly initialized DNN-based systems, especially when the amount of training data is limited. Additionally, speaker-dependent speech recognition experiments, with the initial parameters of our method set the same as in the synthesis experiments, show that our method also outperformed DNN-based systems.
Recognition of Anomalously Deformed Kana Sequences in Japanese Historical Documents
Nam Tuan LY, Kha Cong NGUYEN, Cuong Tuan NGUYEN, Masaki NAKAGAWA

PAPER-Image Recognition, Computer Vision

Publicized: 2019/05/07
Vol: E102-D, No. 8
Page(s): 1554-1564
This paper presents methods for recognizing anomalously deformed Kana sequences in Japanese historical documents, for which IEICE PRMU held a contest in 2017. The contest was divided into three levels according to the number of characters to be recognized: level 1, single characters; level 2, sequences of three vertically written Kana characters; and level 3, unrestricted sets of three or more characters, possibly spanning multiple lines. This paper focuses on the methods for levels 2 and 3 that won the contest. We follow a segmentation-free approach and employ a hierarchy consisting of a Convolutional Neural Network (CNN) for feature extraction, Bidirectional Long Short-Term Memory (BLSTM) for frame prediction, and Connectionist Temporal Classification (CTC) for text recognition, which we call a Deep Convolutional Recurrent Network (DCRN). For level 2, we compare a pretrained-CNN approach and an end-to-end approach in more detailed variations. For level 3, we propose a method of vertical text line segmentation and multiple-line concatenation before applying DCRN. We also examine a two-dimensional BLSTM (2DBLSTM) based method for level 3. We evaluate the best methods by cross-validation. We achieved accuracies of 89.10% for three-Kana-character sequence recognition and 87.70% for unrestricted Kana recognition without employing linguistic context. These results demonstrate the effectiveness of the proposed models on the level 2 and 3 tasks.
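The CTC output stage turns per-frame class probabilities into a label sequence. The simplest decoder, best-path (greedy) decoding, takes the most likely class per frame, merges consecutive repeats, and drops blanks. The paper may use a different decoder (for example beam search), and the blank index and toy alphabet below are assumptions.

```python
import numpy as np


def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, remove blanks.

    frame_probs: (frames, classes) probabilities; alphabet[i] is the symbol for class i.
    """
    best = np.argmax(frame_probs, axis=1)
    decoded = []
    prev = blank
    for idx in best:
        if idx != blank and idx != prev:       # a new non-blank symbol starts here
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)
```

A blank between two identical symbols is what allows genuinely repeated characters to survive the repeat-merging step.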
